perf: faster model resolution, JSON decoding, and request snapshotting#413
Conversation
Reduce per-request CPU and allocations on the gateway hot path. Changes are behavior-preserving for all valid input; the one intentional difference is documented and tested. Model resolution (O(1)): - Add a lazy provider-selector index to the registry (qualifiedByName / qualifiedByType), invalidated at the existing single cache-invalidation point. - Route qualified-selector resolution through it via an optional qualifiedSelectorResolver interface, with the catalog scan kept as a fallback for non-indexed lookups and raw slash-shaped model IDs. - Resolution is now O(1) and constant in catalog size (was O(N), copying the full catalog several times per request): ~31us/164KB -> ~0.8us/0.3KB at 300 models. Deduplicated the redundant name/type scans in resolveQualifiedSelector. JSON decoding (goccy/go-json): - Migrate internal/ + cmd/ from encoding/json to github.com/goccy/go-json (drop-in; package is named json). gjson is unchanged. - ~3.8x faster realistic chat-body decode with fewer allocations. - goccy is slightly more lenient than encoding/json on a couple of malformed inputs (leading-zero numbers; malformed values in skipped passthrough fields). Accepted under the gateway's accept-generously principle and pinned by TestDecoderLeniencyIsBounded. - Drop the redundant gjson.ValidBytes walk in extractUnknownJSONFields (callers already validate via the preceding Unmarshal). Request snapshot allocations: - Add NewRequestSnapshotWithOwnedMaps so ingress capture owns the freshly built route/query/trace maps and body, cloning only the live header map. - Add HeadersView (zero-copy) and route read-only callers to it. - Remove the now-superseded NewRequestSnapshotWithOwnedBody constructor. Perf harness: - Make the gateway hot-path benchmark exercise the real Router + populated catalog (it previously bypassed routing, giving false confidence) and add a guard case for it. Add a resolution micro-benchmark. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Too many files changed for review. ( |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR replaces Changesgo-json migration, RequestSnapshot refactor, O(1) model resolution, and benchmarks
Sequence Diagram(s)Model resolution now supports an optional O(1) fast path when sequenceDiagram
participant req as HTTP Request
participant router as Router
participant resolver as ModelRegistry<br/>qualifiedSelectorResolver
participant fallback as resolveProviderOwnedRawSelector
participant found as core.ModelSelector
req->>router: resolveQualifiedSelector(segment, modelID)
router->>resolver: ResolveProviderSelector(segment, modelID)
alt fast path available
resolver->>resolver: RLock → lookupSelectorIndex
alt index cache hit
resolver-->>router: ModelSelector, ok=true
else index miss
resolver->>resolver: WLock → buildSelectorIndexLocked
resolver->>resolver: populate qualifiedByName, qualifiedByType
resolver-->>router: ModelSelector, ok=true/false
end
else no resolver (fallback)
router->>fallback: scan catalog for match
fallback-->>router: ModelSelector, ok=true/false
end
router-->>found: matched provider selector
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/core/request_snapshot_test.go (1)
74-106: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winExpand owned-maps test coverage to include map/header semantics.
The renamed test currently pins only captured-body ownership. Please also assert that owned route/query/trace maps are not cloned, while headers are still cloned, to lock in the full constructor contract.
💡 Suggested test extension
func TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes(t *testing.T) { + routeParams := map[string]string{"provider": "openai"} + queryParams := map[string][]string{"limit": {"5"}} + headers := map[string][]string{"X-Test": {"a"}} + traceMetadata := map[string]string{"Traceparent": "trace-1"} rawBody := []byte(`{"model":"gpt-5-mini"}`) snapshot := NewRequestSnapshotWithOwnedMaps( "POST", "/v1/chat/completions", - nil, - nil, - nil, + routeParams, + queryParams, + headers, "application/json", rawBody, false, "req-123", - nil, + traceMetadata, "/team/a", ) @@ if &clonedBody[0] == &rawBody[0] { t.Fatal("CapturedBody returned owned bytes directly, want defensive copy") } + + routeParams["provider"] = "anthropic" + if got := snapshot.GetRouteParams()["provider"]; got != "anthropic" { + t.Fatalf("GetRouteParams provider = %q, want anthro pic (owned map)", got) + } + queryParams["limit"][0] = "99" + if got := snapshot.GetQueryParams()["limit"][0]; got != "99" { + t.Fatalf("GetQueryParams limit = %q, want 99 (owned map)", got) + } + traceMetadata["Traceparent"] = "trace-2" + if got := snapshot.GetTraceMetadata()["Traceparent"]; got != "trace-2" { + t.Fatalf("GetTraceMetadata Traceparent = %q, want trace-2 (owned map)", got) + } + headers["X-Test"][0] = "mutated" + if got := snapshot.GetHeaders()["X-Test"][0]; got != "a" { + t.Fatalf("GetHeaders X-Test = %q, want a (cloned headers)", got) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/core/request_snapshot_test.go` around lines 74 - 106, The TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes test currently only validates captured body ownership but does not test the complete contract of the NewRequestSnapshotWithOwnedMaps constructor regarding map and header semantics. Add assertions to verify that route, query, and trace maps passed to NewRequestSnapshotWithOwnedMaps are not cloned by the snapshot (confirming ownership is taken), while headers should still be defensively cloned to prevent external mutations. Create sample maps for these fields, pass them through the constructor, retrieve them via appropriate accessor methods, and verify the pointer equality or inequality as needed to confirm the ownership vs cloning behavior for each map type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/providers/registry.go`:
- Around line 174-183: The code builds selector-index keys using potentially
untrimmed values for publicName and info.ProviderType, while the lookup paths
(lines 126-127) use trimmed inputs. Apply strings.TrimSpace() to normalize
publicName when it is assigned from info.ProviderName, and also apply
strings.TrimSpace() to info.ProviderType before using it in the typeKey
construction. This ensures that keys built during registration match keys used
during lookups even when config/provider metadata contains whitespace padding.
In `@internal/providers/resolve_bench_test.go`:
- Around line 33-35: The benchmark test case labels are mislabeled due to
integer truncation in the buildBenchRegistry call. In the benchmark loop that
iterates over slice values (50, 300, 1000), the calculation n/6 for the
per-provider count causes truncation, resulting in actual model counts that
differ from the label (models=50 actually benchmarks 48 models, models=1000
actually benchmarks 996 models). Fix this by either changing the loop values to
numbers divisible by 6 such that when multiplied back (divideCount * 6) they
equal the intended model count, or recalculate the label to show the actual
number of models being created by computing divideCount * 6 and using that value
in the b.Run label instead of n. Apply the same fix to the second benchmark loop
also mentioned in the comment.
In `@tests/perf/README.md`:
- Around line 26-29: The README.md file contains outdated performance
documentation in the section describing the routed path (around lines 26-29).
The current text still references repeated full-catalog copies per request, but
the actual implementation now uses O(1) selector-index behavior where resolution
is computed once per request and reused. Update the routed path performance
explanation to accurately reflect this current behavior by removing or revising
the outdated statement about order of magnitude allocations from repeated
catalog copies, and instead describe how resolution is now computed once and
reused, which provides the O(1) performance characteristics referenced in the
perf-guard commentary.
---
Outside diff comments:
In `@internal/core/request_snapshot_test.go`:
- Around line 74-106: The
TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes test currently
only validates captured body ownership but does not test the complete contract
of the NewRequestSnapshotWithOwnedMaps constructor regarding map and header
semantics. Add assertions to verify that route, query, and trace maps passed to
NewRequestSnapshotWithOwnedMaps are not cloned by the snapshot (confirming
ownership is taken), while headers should still be defensively cloned to prevent
external mutations. Create sample maps for these fields, pass them through the
constructor, retrieve them via appropriate accessor methods, and verify the
pointer equality or inequality as needed to confirm the ownership vs cloning
behavior for each map type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cbf95ee6-7e26-4f2d-b8a7-0d7f47aad549
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (125)
cmd/gomodel/health.gocmd/recordapi/main.gogo.modinternal/admin/handler_guardrails.gointernal/admin/handler_live.gointernal/aliases/batch_preparer.gointernal/anthropicapi/request.gointernal/anthropicapi/response.gointernal/anthropicapi/stream.gointernal/anthropicapi/types.gointernal/app/app.gointernal/auditlog/auditlog.gointernal/auditlog/entry_capture.gointernal/auditlog/middleware.gointernal/auditlog/reader_postgresql.gointernal/auditlog/reader_sqlite.gointernal/batch/store.gointernal/cache/modelcache/local.gointernal/cache/modelcache/modelcache.gointernal/cache/modelcache/redis.gointernal/conversationstore/store.gointernal/conversationstore/store_memory.gointernal/core/audio.gointernal/core/batch.gointernal/core/batch_json.gointernal/core/batch_preparation.gointernal/core/chat_content.gointernal/core/chat_json.gointernal/core/conversations.gointernal/core/embeddings_encoding.gointernal/core/embeddings_json.gointernal/core/errors.gointernal/core/json_fields.gointernal/core/json_fields_test.gointernal/core/message_json.gointernal/core/request_snapshot.gointernal/core/request_snapshot_test.gointernal/core/responses.gointernal/core/responses_json.gointernal/core/semantic_canonical.gointernal/core/types.gointernal/core/usage_json.gointernal/embedding/embedding.gointernal/gateway/batch_usage.gointernal/guardrails/batch_rewrite.gointernal/guardrails/batch_rewrite_test.gointernal/guardrails/definitions.gointernal/guardrails/executor.gointernal/guardrails/responses_message_apply.gointernal/guardrails/store_mongodb.gointernal/live/broker.gointernal/llmclient/client.gointernal/modeldata/fetcher.gointernal/modeloverrides/batch_preparer.gointernal/modeloverrides/store.gointernal/modeloverrides/store_postgresql.gointernal/modeloverrides/store_sqlite.gointernal/pricingoverrides/store.gointernal/pricingoverrides/store_postgresql.gointernal/pricingoverrides/store_sqlite.gointernal/providers/anthropic/anthropic.gointernal/providers/anthropic/batch.gointernal/providers/anthropic/chat.gointernal/providers/anthropic/chat_stream.gointernal/providers/anthropic/request_translation.gointernal/providers/anthropic/responses.gointernal/providers/anthropic/types.gointernal/providers/bailian/bailian.gointernal/providers/batch_results_file_adapter.gointernal/providers/bedrock/chat.gointernal/providers/bedrock/chat_stream.gointernal/providers/chat_stream_normalize.gointernal/providers/deepseek/deepseek.gointernal/providers/gemini/gemini.gointernal/providers/gemini/native.gointernal/providers/gemini/native_stream.gointernal/providers/googlecommon/auth.gointernal/providers/ollama/ollama.gointernal/providers/openai/openai.gointernal/providers/registry.gointernal/providers/registry_metadata.gointernal/providers/resolve_bench_test.gointernal/providers/responses_adapter.gointernal/providers/responses_content.gointernal/providers/responses_converter.gointernal/providers/responses_input.gointernal/providers/responses_output.gointernal/providers/responses_output_state.gointernal/providers/router.gointernal/providers/vertex/vertex.gointernal/providers/xai/xai.gointernal/providers/xiaomi/audio.gointernal/responsecache/responsecache.gointernal/responsecache/semantic.gointernal/responsecache/simple.gointernal/responsecache/sse_validation.gointernal/responsecache/stream_cache.gointernal/responsecache/stream_cache_chat.gointernal/responsecache/stream_cache_responses.gointernal/responsecache/vecstore_pinecone.gointernal/responsecache/vecstore_qdrant.gointernal/responsecache/vecstore_weaviate.gointernal/responsestore/store.gointernal/server/conversation_responses.gointernal/server/internal_chat_completion_executor.gointernal/server/native_conversation_service.gointernal/server/native_response_service.gointernal/server/request_selector_peek.gointernal/server/request_snapshot.gointernal/server/response_input_items.gointernal/server/translated_inference_service.gointernal/streaming/observed_sse_stream.gointernal/usage/audio.gointernal/usage/cost.gointernal/usage/extractor.gointernal/usage/reader_postgresql.gointernal/usage/reader_sqlite.gointernal/usage/realtime.gointernal/usage/recalculate_pricing.gointernal/usage/store_sqlite.gointernal/workflows/store_postgresql.gointernal/workflows/store_sqlite.gointernal/workflows/types.gotests/perf/README.mdtests/perf/hotpath_test.go
| covers the per-request resolution path. This routed path currently allocates an | ||
| order of magnitude more per request because resolution re-copies the full model | ||
| catalog several times; its guard ceilings should tighten significantly once | ||
| resolution is computed once per request and reused. |
There was a problem hiding this comment.
README performance explanation is outdated for the routed path.
Line 26-Line 29 still describe repeated full-catalog copies per request, but the routed perf-guard commentary now assumes O(1) selector-index behavior. This mismatch can mislead perf investigations.
Suggested fix
-`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
-`ModelRegistry` (the production shape) with a representative catalog, so it
-covers the per-request resolution path. This routed path currently allocates an
-order of magnitude more per request because resolution re-copies the full model
-catalog several times; its guard ceilings should tighten significantly once
-resolution is computed once per request and reused.
+`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
+`ModelRegistry` (the production shape) with a representative catalog, so it
+covers the per-request resolution path. With the selector-index fast path,
+routed overhead should stay close to the bare-provider case and avoid
+catalog-size-linear per-request copying for qualified-selector resolution.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| covers the per-request resolution path. This routed path currently allocates an | |
| order of magnitude more per request because resolution re-copies the full model | |
| catalog several times; its guard ceilings should tighten significantly once | |
| resolution is computed once per request and reused. | |
| `BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` + | |
| `ModelRegistry` (the production shape) with a representative catalog, so it | |
| covers the per-request resolution path. With the selector-index fast path, | |
| routed overhead should stay close to the bare-provider case and avoid | |
| catalog-size-linear per-request copying for qualified-selector resolution. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/perf/README.md` around lines 26 - 29, The README.md file contains
outdated performance documentation in the section describing the routed path
(around lines 26-29). The current text still references repeated full-catalog
copies per request, but the actual implementation now uses O(1) selector-index
behavior where resolution is computed once per request and reused. Update the
routed path performance explanation to accurately reflect this current behavior
by removing or revising the outdated statement about order of magnitude
allocations from repeated catalog copies, and instead describe how resolution is
now computed once and reused, which provides the O(1) performance
characteristics referenced in the perf-guard commentary.
CI (linux/amd64) and local (darwin/arm64) produce identical allocation counts and near-identical byte counts, confirming these are deterministic. Tighten the ceilings from "intentionally generous" to ~10% over the measured baseline so the guard catches smaller regressions while still absorbing Go/dependency drift: hot_path: 125 -> 120 allocs (baseline 113) routed: 160 -> 150 allocs, 18->16 KB (baseline 137 / ~14.7 KB) responses_stream: 310 -> 222 allocs, 25->22 KB (baseline 202 / ~19.6 KB) shared_observers: unchanged (already tight, no headroom to trim) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- registry: trim publicName/ProviderType when building the qualified-selector index and skip empty keys, matching the trimmed lookup inputs and the previous catalog scan (which compared trimmed fields on both sides). Prevents the O(1) fast path from missing when provider metadata carries whitespace padding. - resolve_bench_test: build exactly totalModels (round-robin across providers) instead of providersN*(n/6); the models=50/1000 cases previously benchmarked 48/996 models due to integer truncation. Add benchSelector helper. - request_snapshot_test: extend the owned-maps test to assert route/query/trace maps are owned (not cloned) while headers are still defensively cloned. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Reduces per-request CPU and allocations on the gateway hot path. All changes are behavior-preserving for valid input; the one intentional difference (goccy input leniency) is documented and pinned by a test. Verified with the full
make test-racesuite,make lint, and the hot-path perf guard (all green via pre-commit hooks).Changes
1. O(1) model resolution
Per request, the router resolved the model selector ~6× and each qualified resolution copied the entire model catalog and linear-scanned it (
ListModelsWithProvider).qualifiedByName/qualifiedByType), built once and cleared at the existing single invalidation point.qualifiedSelectorResolverinterface; the catalog scan remains as a fallback for non-indexed lookups and raw slash-shaped model IDs.resolveQualifiedSelector.Measured: resolution is now O(1) and constant in catalog size.
2. JSON decoding →
goccy/go-jsoninternal/+cmd/fromencoding/jsontogithub.com/goccy/go-json(true drop-in; package is namedjson).gjsonunchanged. Test files intentionally stay onencoding/jsonas a stdlib oracle.gjson.ValidByteswalk inextractUnknownJSONFields(callers already validate via the precedingUnmarshal).Behavior note: goccy is slightly more lenient than stdlib on a couple of malformed inputs (leading-zero numbers; malformed values inside skipped passthrough fields). Accepted under the gateway's "accept generously" (Postel's Law) principle and pinned by
TestDecoderLeniencyIsBounded. All valid input decodes identically.3. Request-snapshot allocations
NewRequestSnapshotWithOwnedMapsso ingress capture owns the freshly-built route/query/trace maps and body, cloning only the live header map.HeadersViewand pointed read-only callers at it.NewRequestSnapshotWithOwnedBodyconstructor.4. Perf harness
server.New, bypassing the Router/registry entirely — so the perf guard protected a path that doesn't exist in production. It now wires the real Router + a populated catalog, with a guard case. Added a resolution micro-benchmark (resolve_bench_test.go).Impact framing
JSON decode and model resolution are a slice of per-request work (the upstream LLM call dominates wall-clock), so this is primarily a throughput / CPU / GC win across every endpoint, not a dramatic per-request latency drop.
Risks / follow-ups
github.com/goccy/go-json(MIT, pure Go, v0.10.6) is now on the core hot path — worth a conscious sign-off.Test plan
make test-race(full suite, 58 packages) — greenmake lint— green🤖 Generated with Claude Code
Summary by CodeRabbit